Analyzing Catastrophic Backtracking Behavior in Practical Regular Expression Matching

Authors

  • Martin Berglund
  • Frank Drewes
  • Brink van der Merwe
Abstract

We develop a formal perspective on how regular expression matching works in Java, a popular representative of the category of regex-directed matching engines. In particular, we define an automata model which captures all the aspects needed to study such matching engines in a formal way. Based on this, we propose two types of static analysis, which take a regular expression and tell whether there exists a family of strings which makes Java-style matching run in exponential time.
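The exponential behavior mentioned in the abstract can be illustrated with a toy matcher. The sketch below is neither the automata model from the paper nor Java's actual engine. It is a minimal backtracking matcher written in Python, and every name in it (`count_steps`, the tuple-encoded syntax tree, `AMBIGUOUS`, `UNAMBIGUOUS`) is a hypothetical choice made for illustration. It counts matching steps for the ambiguous expression `(a|a)*b` and the equivalent unambiguous `a*b` on inputs made only of `a`, which can never match.

```python
def count_steps(regex, text):
    """Full-match `text` against `regex` by naive backtracking.

    `regex` is a nested tuple (an assumed encoding, not from the paper):
    ("char", c), ("cat", r1, r2), ("alt", r1, r2) or ("star", r).
    Returns (matched, steps), where steps counts visits to regex nodes.
    """
    steps = 0

    def match(node, i, k):
        # `k` is the continuation: what must still match from position j on.
        nonlocal steps
        steps += 1
        kind = node[0]
        if kind == "char":
            return i < len(text) and text[i] == node[1] and k(i + 1)
        if kind == "cat":
            return match(node[1], i, lambda j: match(node[2], j, k))
        if kind == "alt":
            # Try the left branch first, then backtrack into the right one.
            return match(node[1], i, k) or match(node[2], i, k)
        if kind == "star":
            # Greedy: try one more iteration (which must consume input),
            # and only then try to leave the loop.
            return match(node[1], i, lambda j: j > i and match(node, j, k)) or k(i)
        raise ValueError(f"unknown node kind: {kind}")

    matched = bool(match(regex, 0, lambda j: j == len(text)))
    return matched, steps


A = ("char", "a")
B = ("char", "b")
AMBIGUOUS = ("cat", ("star", ("alt", A, A)), B)  # (a|a)*b
UNAMBIGUOUS = ("cat", ("star", A), B)            # a*b

if __name__ == "__main__":
    for n in (5, 10, 15):
        text = "a" * n  # no trailing "b", so every attempt must fail
        print(n, count_steps(AMBIGUOUS, text)[1], count_steps(UNAMBIGUOUS, text)[1])
```

In this toy model the step count for `(a|a)*b` at least doubles with each additional `a`, because both branches of the alternation succeed and each one forces the rest of the input to be explored again, while the count for `a*b` grows linearly. A static analysis of the kind the abstract describes aims to detect such expressions without running the matcher.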

Similar resources

On the Semantics of Atomic Subgroups in Practical Regular Expressions

Most regular expression matching engines have operators and features to enhance the succinctness of classical regular expressions, such as interval quantifiers and regular lookahead. In addition, matching engines in, for example, Perl, Java, Ruby and .NET also provide operators, such as atomic operators, that constrain the backtracking behavior of the engine. The most common use is to prevent ne...

Efficient Submatch Extraction for Practical Regular Expressions

Stuart Haber, William Horne, Pratyusa Manadhata, Miranda Mowbray, Prasad Rao. HP Laboratories, HPL-2012-41R1. Internal posting date: March 6, 2012. Keywords: regular expressions; submatch extraction; capturing groups.

A capturing group is a syntax used in modern regular expression implementations to specify a subexpression of a regul...

Semantics, analysis and security of backtracking regular expression matchers

Regular expressions are ubiquitous in computer science. Originally defined by Kleene in 1956, they have become a staple of the computer science undergraduate curriculum. Practical applications of regular expressions are numerous, ranging from compiler construction through smart text editors to network intrusion detection systems. Despite having been vigorously studied and formalized in many way...

The Formal Semantics of Rascal Light

Rascal [4] is a programming language that aims to simplify software language engineering tasks like defining syntax, analyzing and transforming programs, and generating code. The language provides many high-level features including native support for collections (lists, sets, maps), algebraic data-types, powerful pattern matching operations with backtracking, and high-level traversals supportin...

Static Analysis for Regular Expression Exponential Runtime via Substructural Logics

Regular expression matching using backtracking can have exponential runtime, leading to an algorithmic complexity attack known as REDoS in the systems security literature. In this paper, we present a static analysis that detects whether a given regular expression can have exponential runtime for some inputs. The analysis works by forming powers and products of transition relations and thereby r...

Publication date: 2014